BACKGROUND OF THE INVENTION

A. Field of the Invention

The present invention relates generally to hardware accelerators and, more particularly, to a
hardware implementation of the pseudo-spectral time-domain (PSTD) method.

B. Description of the Related Art 

Since the advent of the modern computer, a great deal of effort has gone into the
development of numerical algorithms for the rigorous solution of electromagnetic problems. Today,
popular numerical approaches include the finite-element method (FEM), method of moments
(MOM), modal expansion techniques, boundary integral methods, and time-domain methods such as
the finite-volume time-domain (FVTD), multi-resolution time-domain (MRTD), and
finite-difference time-domain (FDTD) methods. Each of these algorithms possesses clear advantages
and disadvantages depending on the specific application. One algorithm in particular, the
pseudo-spectral time-domain (PSTD) method, shows particular promise. In comparison to the above
methods, the PSTD technique can require far less memory while maintaining, and in fact improving,
the accuracy and versatility of electromagnetic analysis. To this end, recent numerical experiments
have confirmed that, for a fixed amount of computational resources, the PSTD method can analyze
problems two to three orders of magnitude larger than an FDTD method with the same level of
accuracy.

However, one of the difficulties of the PSTD method is the large number of forward and
inverse Fast Fourier Transforms (FFTs and IFFTs) that need to be computed, which significantly
slows down the analysis. In comparison to the FDTD method, which has order N computational
dependence, the PSTD method has order N log N. Thus, from a pure software point of view, the
PSTD method is far less appealing in terms of computational resources.

Over the last several decades, significant effort has been put into realizing
application-specific integrated circuits (ASICs) for digital signal processing (DSP). As a result,
ASICs are currently available that perform the FFT and IFFT operations in fractions of a
microsecond. Thus, the PSTD method is far more attractive to implement in hardware than the
FDTD method, due to the wealth of technology that it can leverage. While the FDTD method can
also be realized in hardware, it suffers from the fact that it requires extraordinary amounts of
computer memory. In comparison, the PSTD method can analyze problems two to three orders of
magnitude larger than other full-wave techniques, the FDTD method in particular, with the same
level of accuracy.

Despite the advantages of the FDTD acceleration hardware over software-based
implementations, the FDTD method requires a tremendous amount of memory. This can greatly
limit the size of the problems that are capable of being solved. Because the PSTD method requires
fewer samples than the FDTD method (on the order of 1,000 times fewer), much larger problems can
be solved with the PSTD method given the same amount of memory.

Although PSTD methods are accurate and well defined, current computer system technology
limits the speed at which these operations can be performed. Using the PSTD method to solve a
non-trivial problem can take hours, days, weeks, or even months. Some problems are too large to be
solved effectively within practical time constraints.

Thus, there is a need in the art to overcome the limitations of the related art, and to provide 
for a practical, hardware implementation of the PSTD method. 


SUMMARY OF THE INVENTION 
The present invention solves the problems of the related art by providing a hardware-based
PSTD processor that capitalizes on the large advances in the area of DSP ASICs.
By combining the PSTD algorithm with modern DSP chips and large-scale FPGAs, the PSTD
processor will be capable of solving very large radiation problems in computation times short
enough to enable iterative design. Given this potential, a hardware implementation of the PSTD
method offers the ability to address an extraordinary array of electromagnetic design problems
that heretofore have been impossible to solve.

Further scope of applicability of the present invention will become apparent from the detailed
description given hereinafter. However, it should be understood that the detailed description and
specific examples, while indicating preferred embodiments of the invention, are given by way of
illustration only, since various changes and modifications within the spirit and scope of the invention
will become apparent to those skilled in the art from this detailed description. It is to be understood
that both the foregoing general description and the following detailed description are exemplary and
explanatory only and are not restrictive of the invention, as claimed.



BRIEF DESCRIPTION OF THE DRAWING 

The present invention will become more fully understood from the detailed description given
hereinbelow and the accompanying drawing, which are given by way of illustration only, and thus are
not limitative of the present invention, and wherein:

Fig. 1 is a schematic diagram showing a hardware implementation of the PSTD method in 
accordance with an embodiment of the present invention. 

DETAILED DESCRIPTION OF EMBODIMENTS OF THE INVENTION 

The following detailed description of the invention refers to the accompanying drawings.
The same reference numbers in different drawings identify the same or similar elements. Also, the
following detailed description does not limit the invention. Instead, the scope of the invention is
defined by the appended claims and equivalents thereof.

Equations in the PSTD method take the following form: 

$$E_a = A\,E_a + B\,\frac{\partial H_b}{\partial c} + C\,E_{inc} \qquad (1)$$

where a, b, and c are directions (x, y, or z), A, B, and C are coefficients based on the material
properties of the medium, and E_inc is the incident field associated with the node.

Unlike the finite-difference time-domain (FDTD) method, where both spatial and temporal
derivatives are represented by finite differences, the PSTD method uses FFTs and IFFTs to calculate
the spatial derivatives. For example, recall from basic frequency-domain calculations that the spatial
derivative of a field, H_z, in the y-direction can be written as:

$$\frac{\partial H_z}{\partial y} = \mathrm{IFFT}\left(j\,k_y\,\mathrm{FFT}(H_z)\right) \qquad (2)$$
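
The derivative in equation (2) can be sketched in software as follows. This is a minimal illustration only, not the hardware described here: it assumes a periodic, uniformly sampled line of field values, uses a naive O(N^2) transform in place of a true FFT, and zeroes the Nyquist bin for even N (a common convention that this document does not specify). The names `dft`, `wavenumber`, and `spectral_derivative` are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive discrete Fourier transform; a slow stand-in for an FFT/IFFT."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for m in range(n):
        acc = sum(values[p] * cmath.exp(sign * 2j * math.pi * m * p / n)
                  for p in range(n))
        out.append(acc / n if inverse else acc)
    return out


def wavenumber(m, n, dy):
    """Angular wavenumber k_y for transform bin m (assumed bin ordering)."""
    if 2 * m == n:
        return 0.0  # assumption: drop the Nyquist bin so the result stays real
    signed_bin = m if m < n / 2 else m - n
    return 2.0 * math.pi * signed_bin / (n * dy)


def spectral_derivative(line, dy):
    """Approximate d(line)/dy as IFFT(j * k_y * FFT(line))."""
    n = len(line)
    spectrum = dft(line)
    scaled = [1j * wavenumber(m, n, dy) * s for m, s in enumerate(spectrum)]
    return [v.real for v in dft(scaled, inverse=True)]


# Check against a known case: d/dy sin(y) = cos(y) on one period.
n = 16
dy = 2.0 * math.pi / n
hz = [math.sin(j * dy) for j in range(n)]
dhz_dy = spectral_derivative(hz, dy)
max_err = max(abs(d - math.cos(j * dy)) for j, d in enumerate(dhz_dy))
```

For a smooth periodic input such as this one, the error is limited by floating-point rounding rather than by the mesh spacing.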

To compute the spatial derivative of H_z in the y-direction for some node (i, j, k), all of the
values of H_z in the y-direction along the line (i, ..., k) are required. Although this can require a
tremendous amount of data to be fetched (depending on the mesh size), this operation only needs to
be performed once for all values that require this derivative along this line. If there are N values in
the y-direction, then by fetching N values from memory, the spatial derivative in the y-direction for N
different mesh points may be computed. Although there is a significant latency associated with the
fetch operation, this operation only needs to be performed once per derivative, per timestep.
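
The saving from fetching each line once can be illustrated with a simple count. The sketch below is an assumption-laden illustration: the function name `values_fetched` and the 64 x 64 x 64 mesh are hypothetical, and the "no reuse" case is a deliberately naive baseline in which the whole line is re-read for every node on it.

```python
def values_fetched(nx, ny, nz, reuse_line):
    """Count field values read to obtain the y-derivative at every mesh node."""
    lines = nx * nz  # one y-directed line per (i, k) pair
    if reuse_line:
        return lines * ny       # each line is read once and serves ny nodes
    return lines * ny * ny      # naive baseline: line re-read for each node


# Hypothetical mesh size, chosen only for illustration.
with_reuse = values_fetched(64, 64, 64, reuse_line=True)      # 262144
without_reuse = values_fetched(64, 64, 64, reuse_line=False)  # 16777216
saving_factor = without_reuse // with_reuse                   # 64
```

The saving factor equals the number of values along the line, which is why the one-time fetch latency is amortized more effectively as the mesh grows.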

Despite the large amount of data required to perform the FFT/IFFT operations, it is possible
to efficiently organize RAM to maximize throughput. For example, if the values of H_z were stored
in RAM in increasing-y order, it would be possible to perform a burst-read from RAM. Bursting
allows the RAM to fetch contiguous locations very rapidly. Thus, burst reads allow fetching of all of
the values necessary to compute the derivative, while, at the same time, maximizing the throughput
of the RAM. Because spatial derivatives are required in the x-, y-, and z-directions, values would need
to be stored in the RAM in three different patterns (increasing-x, increasing-y, and increasing-z).
This would permit taking advantage of the bursting capabilities when computing any required
derivative. Similarly, there will be three different "update" orders. The update order specifies the
order in which nodes in the mesh will be updated. Updating nodes in the same order in which they
are stored in RAM allows reuse of the spatial derivatives. Once the spatial derivative of a field is
known in a specific direction, that value can be used to update any node along that line. Thus, by
solving fields along that line, the same value can be used over and over again without incurring the
latency of the RAM fetching and the FFT/IFFT operations.
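
The three storage patterns can be pictured as three ways of linearizing the node index (i, j, k). The sketch below is illustrative only: the function names, the choice of which axes vary second and third, and the 4 x 5 x 6 mesh are all assumptions, since the text specifies only which direction must be contiguous.

```python
def address(i, j, k, shape, fastest):
    """Linear address of node (i, j, k) when axis `fastest` is contiguous."""
    nx, ny, nz = shape
    if fastest == "x":
        return (k * ny + j) * nx + i
    if fastest == "y":
        return (i * nz + k) * ny + j
    if fastest == "z":
        return (j * nx + i) * nz + k
    raise ValueError("fastest must be 'x', 'y', or 'z'")


def y_line_addresses(i, k, shape, fastest):
    """Addresses touched when reading the y-directed line through (i, ..., k)."""
    return [address(i, j, k, shape, fastest) for j in range(shape[1])]


shape = (4, 5, 6)  # hypothetical (nx, ny, nz)
in_y_order = y_line_addresses(2, 3, shape, "y")  # [75, 76, 77, 78, 79]
in_x_order = y_line_addresses(2, 3, shape, "x")  # [62, 66, 70, 74, 78]
in_z_order = y_line_addresses(2, 3, shape, "z")  # [15, 39, 63, 87, 111]
```

Only the increasing-y copy places the line in consecutive addresses, which is the condition a burst read needs; the other two copies serve the x- and z-derivatives in the same way.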

Fig. 1 shows one logical data flow for a PSTD accelerator 10 in accordance with the present
invention. Note that there are three parallel datapaths 12, 14, 16. These datapaths 12, 14, 16 are
completely independent of one another (except for a memory subsystem 18) and each solves fields in
a given direction (x, y, or z). Each of these datapaths 12, 14, 16 is responsible for solving an
equation of the form shown in equation (1).

The flow within each computational path begins by determining the spatial derivative needed
for the computation. The necessary values must be fetched from the memory subsystem 18 and
streamed into an FFT unit 20 that performs the FFT on the values. Depending on the size of the
problem being solved and the capabilities of the FFT unit 20, this computation can easily require
several thousand cycles. While the FFT is being computed by the FFT unit 20, the other necessary
data, including the primary fields, incident fields, and coefficients, can all be fetched from memory
subsystem 18 or computed. This entire computational path will be pipelined to hide much of the
latency of the FFT operation. As results begin emerging from the FFT unit 20, the results are
streamed through a pipelined complex multiplication unit 22 that performs the multiplication
in equation (2) set forth above. At this point, the spatial derivatives will have been computed in the
frequency domain. The next step in each datapath 12, 14, 16 is to convert the frequency domain
result back into the time domain by means of an IFFT unit 24 that performs an IFFT. IFFT unit 24
will also undergo a several thousand-cycle latency. As results begin emerging from the IFFT unit
24, the results are streamed into a Computation Engine (CE) 26. The CE 26, given the necessary
data and the spatial derivative, solves equation (1) set forth above. Once complete, the fields are
written back to the memory subsystem 18 by CE 26.
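
The stages of one datapath can be sketched as a chain of functions, one per unit in Fig. 1. This is a sequential software illustration under stated assumptions, not the pipelined hardware: it assumes equation (1) reduces to new E = A*E + B*(spatial derivative) + C*E_inc with scalar coefficients, uses a naive transform in place of the FFT and IFFT units, and zeroes the Nyquist bin. All function names are hypothetical.

```python
import cmath
import math


def fft_unit(samples, inverse=False):
    """Naive transform standing in for FFT unit 20 and IFFT unit 24."""
    n = len(samples)
    sign = 1.0 if inverse else -1.0
    out = []
    for m in range(n):
        acc = sum(samples[p] * cmath.exp(sign * 2j * math.pi * m * p / n)
                  for p in range(n))
        out.append(acc / n if inverse else acc)
    return out


def complex_multiply_unit(spectrum, spacing):
    """Stand-in for unit 22: scale each bin by j*k, as in equation (2)."""
    n = len(spectrum)
    out = []
    for m, s in enumerate(spectrum):
        signed_bin = 0 if 2 * m == n else (m if m < n / 2 else m - n)
        out.append(1j * (2.0 * math.pi * signed_bin / (n * spacing)) * s)
    return out


def computation_engine(e_old, derivative, e_inc, a, b, c):
    """Stand-in for CE 26: assumed form new E = A*E + B*derivative + C*E_inc."""
    return [a * e + b * d + c * ei
            for e, d, ei in zip(e_old, derivative, e_inc)]


def datapath(h_line, e_line, e_inc_line, spacing, a, b, c):
    """Fetch -> FFT -> complex multiply -> IFFT -> CE, for one line of nodes."""
    spectrum = fft_unit(h_line)
    scaled = complex_multiply_unit(spectrum, spacing)
    derivative = [v.real for v in fft_unit(scaled, inverse=True)]
    return computation_engine(e_line, derivative, e_inc_line, a, b, c)


# With H = cos(y), E = 0, and E_inc = 0, the update should equal -0.5*sin(y).
n = 8
spacing = 2.0 * math.pi / n
h_line = [math.cos(j * spacing) for j in range(n)]
zeros = [0.0] * n
e_new = datapath(h_line, zeros, zeros, spacing, a=1.0, b=0.5, c=1.0)
max_err = max(abs(e + 0.5 * math.sin(j * spacing)) for j, e in enumerate(e_new))
```

In the hardware, each stage would begin consuming results as soon as the previous stage emits them; the list-at-a-time calls here trade that overlap for readability.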

It will be apparent to those skilled in the art that various modifications and variations can be
made in the hardware implementation of the PSTD method of the present invention and in
construction of the hardware without departing from the scope or spirit of the invention.

For example, although Fig. 1 shows three parallel computational datapaths 12, 14, 16, more
or fewer such datapaths may be provided. Fewer datapaths may be implemented to save on hardware,
or extra datapaths may be included to increase parallelism. The actual number of datapaths is mostly
dependent upon the capabilities of the memory subsystem 18.

Also, the FFT and IFFT units 20, 24 may or may not be co-located with complex
multiplication unit 22 and CE unit 26. FFT and IFFT units 20, 24 may be implemented inside of a
field-programmable gate array (FPGA). Alternatively, custom FFT/IFFT chips may be purchased or
DSP chips may be programmed to solve the FFT/IFFT equations. The latter configurations require
DSP chips external to the FPGA. This would result in a more complex printed circuit board and
latency in transferring data into and out of the FPGA. On the other hand, DSP chips are extremely
fast and would save FPGA resources. Nevertheless, with very advanced FPGAs already on the market,
an implementation where the FFT/IFFT operations are performed inside the FPGA is quite possible.

By implementing the PSTD method in hardware, computational speedup is achieved that
allows solving problems much faster than current software-based methods and also allows solving
problems that were heretofore unsolvable due to time constraints. The hardware implementation of
the PSTD method of the present invention may be used in any field or application utilizing a
software method based on a pseudo-spectral time-domain approach.

Other embodiments of the invention will be apparent to those skilled in the art from
consideration of the specification and practice of the invention disclosed herein. It is intended that
the specification and examples be considered as exemplary only, with a true scope and spirit of the
invention being indicated by the following claims.